Tests on the Accuracy and Scalability of the Full-Potential DFT Method Based on Multiple Scattering Theory

We investigate a reduced scaling full-potential DFT method based on the multiple scattering theory (MST) code MuST, which is released online (https://github.com/mstsuite/MuST) very recently. First, we test the accuracy by calculating structural properties of typical body-centered cubic (BCC) metals (V, Nb, and Mo). It is shown that the calculated lattice parameters, bulk moduli, and elastic constants agree with those obtained from the VASP, WIEN2k, EMTO, and Elk codes. Second, we test the locally self-consistent multiple scattering (LSMS) mode, which achieves reduced scaling by neglecting the multiple scattering processes beyond a cut-off radius. In the case of Nb, the accuracy of 0.5 mRy/atom can be achieved with a cut-off radius of 20 Bohr, even when small deformations are imposed on the lattice. Despite that the calculation of valence states based on MST exhibits linear scaling, the whole computational procedure has an overall scaling of about O(N1.6), due to the fact that the updating of Coulomb potential scales almost as O(N2). Nevertheless, it can be still expected that MuST would provide a reliable and accessible way to large-scale first-principles simulations of metals and alloys.


INTRODUCTION
Kohn-Sham density functional theory (KS-DFT) (Kohn and Sham, 1965) transforms the many-body problem to a non-interacting system and has been widely used in modern first-principles calculations. Although many computational schemes are developed to solve the Kohn-Sham equation (Kohn and Sham, 1965) for the ground-state properties, the Korringa-Kohn-Rostoker Green's function (KKR-GF) method (Korringa, 1947;Kohn and Rostoker, 1954), also known as multiple scattering theory (MST), provides equivalent information by the single-particle GF (Economou, 2006). In the MST approach, the system is divided into non-overlapping atomic regions as a set of scatterers. To solve the single-site scattering problem, one numerically determines the angular momentum and energy-dependent solutions of the radial Schrödinger equation for a given potential. The coherent matching of the single-site solutions can be achieved if and only if the incoming wave of an atomic site is identical to the superposition of the outgoing waves from all other scatterers. This viewpoint not only gives access to the Kohn-Sham eigenstates but also to the single-electron GF of the system, which leads to the modern KKR-GF method.
The survey (Aarons et al., 2016) suggests that KKR-GF or MST method remains important for large-scale metallic systems. The favorable scaling in MST is attributed to the fact that the electron density, which is the fundamental quantity in DFT, can be obtained from the site-diagonal blocks of the scattering path matrix. And the site-diagonal block of the scattering path matrix for a particular atom can be solved with sufficient accuracy by considering only the electronic multiple scattering processes in a finite-sized region centered at this atom. This region is referred to as the local interaction zone (LIZ), which is originally introduced in the locally self-consistent multiple-scattering method (LSMS) (Wang et al., 1995). Base on the central idea of the LSMS method, the locally self-consistent GF (LSGF) approach (Abrikosov et al., 1996(Abrikosov et al., , 1997 can choose judiciously the effective medium to decrease the LIZ size. In particular, the linear scaling has been achieved in LSMS with muffin-tin approximation and in LSGF with tight-binding linear muffin-tin orbital (TB-LMTO) basis. It should be mentioned that besides the MST-based methods, other approaches to reduced scaling DFT methods for metallic systems have also been developed in recent years Pratapa et al. (2016), Suryanarayana et al. (2018), Aarons and Skylaris (2018), Mohr et al. (2018).
There is a trend toward the full-potential (FP) MST in which no shape approximation is assumed for the potential. Many questions in materials science, for example, on complex defects, interfaces, dislocations, as well as nanostructures, come to a great demand for the reduced scaling FP method. KKRnano, a massively parallel DFT package based on MST, has been developed and optimized for thousands of atoms without a compromise on the FP accuracy . And this package has been applied to study the role of the vacancy clusters in metal-insulator transitions (Zhang et al., 2012).
However, most MST simulation packages are in-house, which impedes the application of MST as a powerful tool for large scale or disordered systems. Recently, the MuST package, an ab initio calculation software package based on FP MST (Rusanu et al., 2011), is open to public and is free to download online (https://github.com/mstsuite) under a BSD 3-clause license. We focus on the MST part in the MuST package, which not only provides features for calculating physical properties of materials but also serves as a platform for implementing and testing the numerical algorithms. At present, the MuST package is capable of performing the following calculations: (1) muffin-tin approximation, (2) FP method, (3) coherent potential approximation (CPA), and (4) LSMS method. And the fully relativistic MST by solving the Dirac equation has been implemented in MuST Liu et al. (2016. For such a newly released package, it is prerequisite to perform systematic tests both on the accuracy and efficiency. A reliable FP method can be used to exactly capture the small energy difference for the lattice distortion or deformation. According to the elastic theory, we deform the crystalline cell to the distorted lattice structures and then calculate their energies. The small energy change with the lattice deformation can be used to calculate the elastic constants (Vitos, 2007). Asato et al. (1999) investigated total energy calculations for metals and semiconductors based on the FP MST method. But few work pay attention to validate the elastic properties based on MST, which is fundamental for applying MST to study the structural properties of materials. Considering the anomalies behavior of deformations in body-centered cubic (BCC) V and Nb (Nagasako et al., 2010;Dezerald et al., 2016), we employ the different ab initio methods including the FP MST method in MuST package to calculate the elastic constants of V, Nb, and Mo. By comparing with results of available experiments and other popular firstprinciples packages, we investigate the accuracy of the FP MST method in MuST package.
To estimate the parallel scalability, we carried out strong and weak scaling tests of the FP LSMS method in MuST package. It is seen that the LSMS method exhibits a good strong scalability. This is due to the two-level parallelism over atoms and energy points implemented in MuST package. However, in the weak scaling test, the overall computational procedure is not linear scaling, which seems to be inconsistent with the O(N) scaling of the muffin-tin LSMS proved in previous work (Wang et al., 1995). By analyzing the implementation scheme, we attribute it to the difference in updating the Coulomb potential between the muffin-tin approximation and the FP method. While the solution of eigenvalue problems is avoided in the MST method, the calculation of Coulomb potential could become the performance bottleneck in large-scale first-principles simulations. For example, PRinceton Orbital-Free Electronic Structure Software (PROFESS) (Ho et al., 2008;Hung et al., 2010;Chen et al., 2015) suggested that about 70% of computation time was spent on fast Fourier transforms (FFTs) to calculate the kinetic and electron-electron Coulomb interaction terms. PROFESS features plane-wave basis set and has been optimized for peta-scale computing (Chen et al., 2016). The calculation of MST is based on angular momentum expansion and new algorithms should be developed to optimize the overall scaling of the FP MST method.
The rest of this paper is arranged as follows. In section 2, we introduce the MST method and its LSMS variant. In section 3, we investigate the accuracy by calculating the equation of state (EOS) and elastic constants of typical BCC metals. In section 4, we test and analyze the scalability of the FP LSMS method. In section 5, some concluding remarks are drawn.

MST Method
The term MST method in this manuscript refers to the modern version of the KKR method, that is, the KKR-GF method. The central quantity to be computed in the MST method changes from the Kohn-Sham orbitals in band theory methods to the one-electron GF, which can be defined as solutions of the following differential equation (Economou, 2006) (nonspin polarized cases assumed and atomic unitsh = 1 and m e = 1/2 used): where V eff is the Kohn-Sham effective potential under exchangecorrelation approximations like the local density approximation (LDA) or the generalized gradient approximation (GGA), ρ(r) is the electron density, and z ≡ ǫ + ıη is a complex variable. If z is real and ǫ belongs to the continuous spectrum of −∇ 2 + V eff [ρ], G(r, r ′ ; ǫ) is not well-defined and one may define the retarded GF In the following, the superscript + will be omitted. Once the GF is known, the valence electron density can be obtained by (Gonis and Butler, 2000;Economou, 2006;Faulkner et al., 2018) where ǫ F is the Fermi energy, the bottom energy ǫ B is chosen between the highest core-state energy and the valence band, and the factor 2 accounts for the number of electron spins. The energy integration in Equation (3) can be carried out along a contour in the complex energy plane so that only few tens of energy points are needed. Other physical quantities like the density of states (DOS) can also be obtained from the GF (Gonis and Butler, 2000;Economou, 2006;Faulkner et al., 2018).
The MST method provides a convenient access to the GF. In the MST method, atoms in the system are considered as scattering centers of which the scattering properties are described by the so-called single-site scattering t-matrix (Gonis and Butler, 2000;Faulkner et al., 2018). The space is divided into nonoverlapping cells n centered at atomic positions R n , where n is the index of atoms in the system. In the vicinity of atomic site n, it is proved that the GF can be expressed as (Faulkner and Stocks, 1980;Gonis and Butler, 2000;Sébilleau, 2000;Zabloudil et al., 2006) G(r n , r n ; ǫ) = LL ′ Z n L (r n ; ǫ)τ nn LL ′ (ǫ)Z n× L ′ (r n ; ǫ) − L Z n L (r n ; ǫ)J n× L (r n ; ǫ), where L is the combined index of angular momentum quantum number l and magnetic quantum number m, r n ≡ r − R n the relative coordinate, Z n L (r n ; ǫ) and J n L (r n ; ǫ) regular and irregular solutions of the single-site problem in cell n for momentum L and energy ǫ, and τ nn LL ′ (ǫ) site-diagonal matrix elements of the scattering path operator τ nm (ǫ) in the angular momentum representation. The × symbol in Equation (4) means that we take the complex conjugate of the spherical harmonics and keep remaining parts of the function unchanged.
The scattering path operator τ nm (ǫ) describes all possible scattering events of electron states with energy ǫ between atomic sites n and m. In the angular momentum representation, the corresponding scattering path matrix is given by (Gonis and Butler, 2000;Zabloudil et al., 2006) where the underline symbol indicates matrices with respect to the angular momentum index L, t n (ǫ) is the single site scattering t-matrix associated with site n, and G nm 0 (ǫ) is the free-electron propagator matrix in the angular momentum representation, also known as KKR structure constant matrix, that describes the propagation of a free electron with energy ǫ from site n to site m. Note that we have omitted the dependence on energy ǫ in Equation (5) for a compact expression.
In the case of a finite system with N atoms, it is seen from the second equation in Equation (5) that the scattering path matrix can be computed directly by a matrix inversion: where the subscript nm on the right hand side indicates the block at the nth row and mth column of the big matrix after the inversion has been taken. In the case of periodic systems, the equation in Equation (5) for the scattering path matrix can be solved by the lattice Fourier transform, leading to (we assume that there is only one atom in the unit cell): (7) where BZ is the first Brillouin zone and G 0 (k, ǫ) is the lattice Fourier transform of G 0 (ǫ) (the double underline indicates matrices with respect to the angular momentum index and the atomic site index) (Gonis and Butler, 2000;Zabloudil et al., 2006).

LSMS Method
As described above, the MST method makes unnecessary the calculation of the Kohn-Sham orbitals, and consequently the time-consuming procedure for diagonalization and orthogonalization in the conventional KS-DFT calculations can be entirely avoided. The only global operation required by computing the GF is to obtain the scattering path matrix by an inversion of the matrix in Equation (6). Since the size of the matrix is proportional to the number of atoms in the unit cell, the MST method still suffers from cubic scaling limitation.
To reduce the scaling of the procedure, we can calculate the nth site-diagonal block of the scattering path matrix τ nn by neglecting the multiple scattering processes that involve atoms beyond some cut-off radius R LIZ from atomic site n. This is based on the observation that the scattering processes involving far away atoms have little influence on the electronic scattering behavior in the vicinity of atomic site n, which is the essence of the LSMS method. The region within distance R LIZ from the central atom is called the LIZ. If there are M atoms in the LIZ, the solution of the multiple scattering problem scales as O(NM 3 ), where N is the total number of atoms. Consequently, the LSMS method is expected to exhibit the linear scaling in N with a prefactor determined by M and the number of basis functions.

Coherent Potential Approximation
Due to the convenient access to the GF, the MST method plays a prominent role in first-principles alloy theory, in which a novel candidate is the CPA (Soven, 1967;Taylor, 1967;Gyorffy, 1972;Ruban and Abrikosov, 2008). The CPA is designed to obtain an ordered effective medium to describe properties of the multicomponent random alloy. The scattering path operator of the CPA effective medium, denoted by τ CPA , is determined by the following self-consistency condition (two-component alloy as the example): where τ A(B) is the scattering path operator of the auxiliary system constructed by replacing the central site in the ordered effective medium system by the alloy component A(B). Within the single-site approximation, it can be proved that the GF of the CPA effective medium system is equal to the targeted ensemble averaged GF (Faulkner, 1982;Ebert et al., 2011). The CPA condition in Equation (8) needs to be reformulated into a proper expression to be suited for numerical applications (Faulkner, 1982;Turek et al., 1997).

TEST ON ACCURACY
In this section, we investigate the accuracy of the FP MST method implemented in the MuST package by comparing equilibrium bulk properties and elastic constants with those calculated by the WIEN2k, EMTO, and VASP codes.

Calculation Details
In order for a meaningful comparison, we used the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional (Perdew et al., 1996) in all our calculations, and carried out convergence tests to determine the numerical parameters for each code. The relativistic effect of the core electrons was treated via the default scheme in each package. In the following, we enumerate the detailed settings of numerical parameters.

MuST
The uniform grid for the computation of the Coulomb potential was chosen as 64 × 64 × 64. The Monkhorst-Pack k-point mesh was set to be 21×21×21 in all the KKR tests. The break condition for the electronic SCF (self-consistent field) iterations was that differences in the total energy and the potential are smaller than 5 × 10 −8 Ry and 10 −7 Ry, respectively. The maximum angular momentum used in the expansion of the wave functions and the GFs was set to l max = 4. The number of radial grid points from the atomic center to the muffin-tin radius was chosen to be 2001, which is sufficiently accurate for solving the single-site scattering problem.

WIEN2k
The WIEN2k package (Blaha et al., 2020) implements an FP linearized augmented plane-wave (LAPW) method. No shape approximations have been made on the potential and charge density inside the muffin-tin spheres and in the interstitial region. In our calculations, the muffin-tin sphere radius was fixed as 2.50 Bohr, the cutoff parameter R MT · K max was chosen to be 8.00, and the plane-wave expansion cutoff G max was set as 14.00 Ry. And a 15 × 15 × 15 Monkhorst-Pack k-point mesh was used for the Brillouin zone sampling. The chosen R MT · K max and k-mesh ensure that errors in the total energies of the deformed structures are converged to 10 −4 Ry in elastic constant calculations.

EMTO
The EMTO package implements the so-called exact muffintin orbitals method, in which different from former muffin-tin methods, the single-electron states are calculated exactly for the optimized overlapping muffin-tin (OOMT) potential. We refer the readers to Vitos et al. (2001), Vitos (2001), and Vitos (2007) for the detailed theory and applications of the EMTO method. In our calculations, the EMTO basis set including s, p, d, and f orbitals was used in combination with soft-core approximation.
For the integration over energy in the complex plane, we used 24 points along a semicircular contour. The Brillouin zone was sampled by a 21 × 21 × 21 Monkhorst-Pack k-point mesh to make the total energies of the deformed structures converge up to 3 × 10 −5 Ry.

VASP
The Vienna ab initio simulation package (VASP) (Kresse and Furthmüller, 1996a,b;Kresse and Joubert, 1999) describes the electron-ion interactions by the projector-augmented wave (PAW) method. In our calculations, the kinetic energy cutoff for the plane-wave basis set was 400 eV. A 15 × 15 × 15 Monkhorst-Pack k-point mesh was used for the Brillouin zone sampling. And the SCF convergence criterion was set to be 10 −7 Ry.

Equation of State
The lattice parameter a, bulk modulus B, and pressure derivative of the bulk modulus B ′ have been commonly used for accuracy assessments of DFT codes and (pseudo)potential libraries (Kucukbenli et al., 2014;Lejaeghere et al., 2014Lejaeghere et al., , 2016. These structural properties can be extracted from the EOS for a solid. For instance, in a Morse type of EOS, the total energy is fitted by an exponential function with four parameters (D 0 , γ , a 0 , and E 0 ) Then a 0 , B 0 , and B ′ can be derived from the Morse function and used to assess the accuracy of DFT codes under investigation.
To investigate the accuracy of the FP MST method in the MuST package, we calculated a 0 , B 0 , and B ′ of three bulk systems V, Nb and Mo, and compared the results with all-electron packages including WIEN2k, EMTO, and VASP. In these tests, we employed the same exchange-correlation functional and equivalent numerical settings, as introduced in section 3.1. Figure 1 shows the calculated E(a) curves from these packages, which have been shifted so that all E(a) points with the lowest energy are adjusted to zero. The fitted results of a 0 , B 0 , and B ′ are given in Table 1. In addition, results from the Elk package, those obtained by the FP-LMTO method, and experimental values from the literature are also listed in Table 1 as a reference. The  Exp. (Frederikse, 1972;Maschke and Levy, 1983;Young, 1991;Haas et al., 2009)  Exp. (Frederikse, 1972;Ashkenazi et al., 1978;Young, 1991;Haas et al., 2009)  Exp. (Dickinson and Armstrong, 1967;Frederikse, 1972;Young, 1991;Haas et al., 2009)  differences with respect to the results from Elk are illustrated in Figure 2. We see from Table 1 and Figure 2 that except for the bulk modulus of V, differences in the calculated a and B results are less than 0.5 % and 5 %, respectively, which could be considered as small discrepancies between different codes (Holzwarth et al., 1997;Kresse and Joubert, 1999;Lejaeghere et al., 2016). The lattice parameter a of BCC V metal obtained by MuST is slightly larger than that of other ab initio methods, but the difference of a is only 0.39%, with respect to the Elk's result. For BCC Nb and Mo, both MuST lattice parameters are very close to those results from Elk. The relative error is 0.03% for Nb and 0.08% for Nb, respectively. Generally speaking, the PBE predicted lattice parameter is overestimated, that is, theoretical lattice parameter is usually larger than experimental values, whereas for V, all present ab initio lattice parameters listed in Table 1 are smaller than the experimental value at 0 K. But for Nb and Mo, the ab initio lattice parameters are slightly larger, with respect to their experimental values. The bulk modulus B represents the stress v.s. the volume expansion or compression. And its derivative B ′ can be used to describe the anharmonic effect in the vibrating lattice. Comparing the calculated bulk moduli and their derivatives, we find that for BCC V metal the MuST B is slightly smaller, within 6.9%, than the Elk bulk modulus, while EMTO, WIEN2k, and VASP results agree well with each other. This is consistent with the fact that the MuST lattice constant is slightly larger, within 0.4%, than the Elk result, whereas the relative discrepancy is within 0.2% among the results of other codes.
Finally, it is necessary to mention that the energy-lattice curve of a solid is sensitive to the treatment of semi-core states. For example, Nb has core (1s, 2s, 2p, 3s, 3p, 3d), semi-core (4s, 4p), and valence (4d, 5s) states. Due to the limitation in the current implementation of MuST, only the 4d5s electrons of Nb are considered as valence electrons, and the semi-core states are treated as core states. The same treatment is imposed for the semi-core states of V (3s, 3p) and those of Mo (4s,4p). In MST as well as in other all-electron methods including LAPW and EMTO, both the core and the valence states participate in the self-consistent iteration. The difference is that the core states are calculated using the spherical part of the crystal potential in the atomic sphere Singh and Nordström (2006). The wave function for each core state is confined and normalized within the sphere radius. In the case that semi-core states are treated as core states, since their charges are no longer well confined inside the atomic sphere, the so-called confinement error appears and a proper setting of the bounding sphere radius becomes important Asato et al. (1999). Different from MuST, in the WIEN2k calculations, a recommended separation energy of -6.0 Ry automatically defines the core-and band-states. Accordingly, both the semi-core and valence states of V, Nb, and Mo metals are treated as band states and solved using the full potential of the crystal. In the PAW method of the VASP package, the frozen-core approximation is used, so the core electrons will not participate in the selfconsistent calculations. And the PAW atomic datasets including semi-core states for V, Nb, and Mo are provided by the VASP POTCAR library to be utilized for accurate calculations. The main point is that: the differences in the treatment of the semicore states may cause noteworthy discrepancies in the calculated results, and we suggest that the semi-core states are allowed to be treated as band states in the future version of MuST.

Elastic Constants
In a cubic lattice there are three independent elastic constants, c 11 , c 12 , and c 44 , of which c 11 and c 12 are connected to the bulk modulus B = (c 11 + 2c 12 )/3 and the tetragonal shear modulus c ′ = (c 11 − c 12 )/2. The two shear elastic parameters c ′ and c 44 were computed according to the standard methodology (Vitos, 2007). For example, we used the following volume conserving orthorhombic and monoclinic deformations:  Table 1. Their differences with respect to experiments are shown in Figure 4. Due to the calculations of c 11 and c 12 via the combination of c ′ and bulk modulus, the accuracy of c ′ plays a key role in the quality of c 11 and c 12 results. From Figure 4, we can see that for the c ′ of V, the MuST result agrees well with the results from WIEN2k and VASP. Due to the small bulk modulus, our MuST calculated c 11 and c 12 are slightly different from those of WIEN2k and VASP. For c 11 and c 12 of Nb, results from MuST, WIEN2k, and VASP are all close to experiments at room temperature, whereas the difference of c ′ between calculations and experiments at 0 K is up to 13.4% for MuST, 20.2% for WIEN2k, and 16.6% for VASP. For Mo, the discrepancy of c ′ with the experimental value is up to 11.0% for MuST and EMTO, but it is only 2.9%/3.6% for VASP/WIEN2k. This results in the large difference for c 11 and c 12 between MuST/EMTO and WIEN2k/VASP calculations. Although EMTO and FP-LMTO can be regarded as similar muffin-tin type methods, their calculated elastic constants are very different. The main reason is that the available FP-LMTO results were calculated based on the LDA (Söderlind et al., 2000).
From Table 1, we can find for V and Nb that c 44 results of MuST and EMTO are close to experimental values, while those from WIEN2k and VASP much smaller. We note that the early work on elastic constants c 44 is 17.1 GPa for V and 10.3 GPa for Nb (Koči et al., 2008;Nagasako et al., 2010). There is an anomalous dispersion of transverse acoustic phonons propagating along the <100> direction in V and Nb. Softening of acoustic phonons induces small values of the shear constant. The soft acoustic phonons and small shear constants are related to the nesting properties of the Fermi surface, which produce a van Hove singularity in the electronic DOS near the Fermi level (Landa et al., 2006b;Nagasako et al., 2010). Due to the presence of van Hove singularity, an extremely fine mesh for Brillouin zone integration suggested in Nagasako et al. (2010) was expected to determine the exact c 44 . However, in practice, the convergence of c 44 with respect to the k-point density may be very slow (Nagasako et al., 2010). Instead of using an extremely dense kmesh, the smearing methods can be used to handle the singularity in DOS. It is reported in Nagasako et al. (2010) that the smearing method has a clear impact on the c 44 results. We note that smearing is performed in WIEN2k and VASP calculations, but in the MuST and EMTO codes no smearing methods are used. This might be the reason on the discrepancy between theoretical results. For Mo, the MuST calculated c 44 is 18.7% larger than the experimental value at 0 K, but the c 44 from WIEN2k and VASP are in good agreement with experiments. The ultimate reason of differences in the elastic constant between ab initio calculations and experiments is still far from resolved. We noted that there exist variations between experiment results, for example, for Mo the experimental value of c 44 is about 110.9 GPa at 0 K reported in Dickinson and Armstrong (1967), while another experiment is about 125 GPa at 0 K (Featherston and Neighbours, 1963). So it may be necessary to estimate the accuracy of experiments at low temperatures and the improved extrapolation method may also be desirable.

TEST ON SCALABILITY
In this section, we investigate the strong and weak scalabilities of the FP LSMS method implemented in MuST package.

Convergence on LIZ Size
In practice, the first question on the LSMS method may be the proper choice of the LIZ size for an atomic site. We can calculate the total energy of the bulk system using the LSMS method with increasing LIZ sizes and compare the results with those obtained by the standard MST method. The convergence tests on the LIZ radius have been performed for face-centered cubic (FCC) Cu and BCC Mo in Faulkner et al. (2018). For FCC Cu, the LSMS energy agrees with the reference MST result to better than 0.5 mRy when 6 neighboring shells are included in the LIZ. This corresponds to a cluster of 87 atomic sites with a LIZ radius of 11.7 Bohr. For BCC Mo, on the other hand, a larger LIZ is required in order to achieve better than 0.5 mRy accuracy. We test the convergence of the LIZ radius for BCC Nb and its deformed structures. As shown in Figure 5, the LSMS total energies converge to the MST results when the LIZ radius is larger than 20 Bohr. Indeed, we have achieved the accuracy of 0.5 mRy by including 14 neighboring shells into the LIZ, which corresponds to about 330 atoms. This is due to the fact that the Fermi energy falls in the d bands so that the DOS near the Fermi energy is significant.
We would like to mention that in the MuST code, the LIZ is embedded in the vacuum with free-electron GF. The LIZ size may be effectively decreased by choosing the effective medium instead (Zeller et al., 1995;Abrikosov et al., 1997), which could provide clues for improving the performance of the LSMS method in a future release.

Strong Scalability
The complexity of the FP MST method can be estimated by the weak scaling test. A good strong scalability is a prerequisite to an effective weak scaling test. Under the premise of good strong scalability, the computational overhead can be revealed by the execution time since the communication overhead contributes a small percentage. We constructed a BCC supercell consisting of 1024 niobium (Nb) atoms. The LIZ of each atomic site contains 89 atoms. As illustrated in Table 2, the LSMS method exhibits a good strong scalability. This is due to the two-level parallelism over atoms and energy points implemented in MuST package. The 1024 atoms are distributed over from 128 to 1024 MPI (message passing interface) processes. When the number of MPI processes exceeds the number of atoms, a second level of parallelization over energy points is performed.
The intrinsic parallelism comes from the fact that the computation of the GF for each atom and each energy point along the complex contour is essentially independent. Each MPI process exchanges t-matrix with the others treating the neighboring atoms in the LIZ region. There are no global operations involved in the process of calculating the GF other than few global sum operations such as the summation of the net charge in each atomic cell for the determination of the electron chemical potential. Consequently, the MST method can be highly parallelized, as shown in Figure 6.

Weak Scalability
In the weak scaling test, the system size and the number of MPI processes are increased concurrently when one atom per MPI process is kept unchanged. In the pure atom parallelization, we observe a significant growth in the execution time with increasing atoms, as shown in Table 3. Consequently, the overall complexity  of the FP MST method is not O(N). The execution time of one SCF iteration can be divided into two parts. One part is for the solution of the valence state by KKR-GF method. The other is used to update the effective potential. They are denoted by t val and t pot in Table 3. It can be observed that t val remains almost the same while t pot grows with increasing system size. So the linear scaling is achieved in solving the GF function, which is consistent with the tests in Thiess et al. (2012). As the system size becomes large, the computational overhead on updating the effective potential becomes gradually dominant, which deserves further analysis.

Scaling Analysis for Updating Potential
As shown in Table 3, the execution time on updating the exchange-correlation potential, denoted by t xc , remains almost the same as system size increases. Therefore, we concentrate on the Coulomb potential. In the FP method, the total charge density is divided into the following two parts: whereρ is chosen as a smoothly varying density andρ is the sphere-bounding non-overlapping charge density. The associated Coulomb potential withρ can be formulated as likê which can be calculated by the multi-pole expansion technique together with the periodic boundary condition and the constraint The procedure is somewhat analogous to the calculation of the Coulomb potential in muffin-tin approximation. The difference is that the non-spherical potential in FP method has multi-pole expansion while the spherical one in muffin-tin approximation has only zero-order moment. Actually, both the two schemes have linear scaling. The charge densityρ can be regarded as a pseudo electron density varying smoothly. The associated Coulomb potential can be determined by solving the Poisson equation: And fast Fourier transform (FFT) is used for solving Equation (13). In the MST method, both the electron density and oneelectron potential are discretized on the spherical mesh around each atom. Therefore, an interpolation from the uniform FFT mesh to the spherical mesh is required. More specifically, the radial partṼ L is calculated fromρ on the uniform FFT grids. The computational scheme can be formulated as the integral form: where F L (r j , r ′ j ) is defined as follows: where j l is the spherical Bessel function and Y L is the spherical harmonic. In Equations (14) and (15), r j stands for the radial grid point, r ′ and G represent the real-space FFT grid and the corresponding reciprocal grid, respectively, and r ′ j = |r ′ − R j |. Since F L (r j , r ′ j ) is independent of the electron density, it can be setup once before the SCF iteration. However, the summation in Equation (14) scales as N rad · N · N FFT , where N rad is the number of radial grids and set to be 2001 in the test. And N FFT is proportional to the number of atoms N, which yields O(N 2 ) scaling in performing the interpolation like Equation (14). We locate the code segment to perform (14) and denote its execution time by t * in Table 3. As shown in Figure 7, the calculations of valence states scale as O(N), while the interpolation from FFT uniform grid to the atom-centered radial grid exhibits an O(N 2 ) scaling, which results in an overall scaling of O(N 1.6 ). Therefore, a more efficient algorithm to update the Coulomb potential in angular momentum expansion is critical to achieve a linear scaling FP MST method.

CONCLUSION
We have investigated the accuracy and scalability of the FP MST method implemented in MuST package. The MST predicted lattice parameter for V, Nb, and Mo are consistent with the other calculations and the available experiments. The MST predicted bulk moduli, pressure derivative of the bulk modulus, and the c ′ elastic constant are acceptable, expect for a relatively larger difference in the bulk modulus of V. While for c 44 , there exists large difference between theoretical and experimental results, the possible reasons have been discussed in details. It is suggested that a proper treatment of the semi-core states should be considered in the future version of the MuST package.
A significant advantage of the MST method is the reduced scaling in the calculations of metallic systems. Although the linear scaling has been reported previously under the muffin-tin approximation, tests in this work imply that the overall scaling of the FP method is not O(N). It is suggested that the updating of the Coulomb potential in angular momentum expansion should be further improved. Nevertheless, a favorable scaling as O(N 1.6 ) can be achieved in the full-potential MST method, compared to the O(N 2 ) to O(N 3 ) scaling of frequently-used methods. Another advantages in MuST is the treatment of chemical and magnetic disorders based on the CPA.
In summary, the FP MST method shows the potential to simulate more complicated materials on massively parallel supercomputers. And MuST provides a reliable and accessible way to large-scale first-principle simulations of metals and alloys.

DATA AVAILABILITY STATEMENT
All datasets presented in this study are included in the article/supplementary material.